109 research outputs found

    Noise Models in Classification: Unified Nomenclature, Extended Taxonomy and Pragmatic Categorization

    This paper presents the first review of noise models in classification covering both label and attribute noise. The study of these models reveals the lack of a unified nomenclature in this field. To address this problem, a tripartite nomenclature based on the structural analysis of existing noise models is proposed. Additionally, their current taxonomies are revised, combined and updated to better reflect the nature of any model. Finally, a categorization of noise models is proposed from a practical point of view, depending on the characteristics of the noise and the purpose of the study. These contributions provide a variety of models for introducing noise, their characteristics according to the proposed taxonomy and a unified way of naming them, which will facilitate their identification and study, as well as the reproducibility of future research.

    Noise simulation in classification with the noisemodel R package: Applications analyzing the impact of errors with chemical data

    Classification datasets created from chemical processes can be affected by errors, which impair the accuracy of the models built from them. This highlights the importance of analyzing the robustness of classifiers against different types and levels of noise in order to understand their behavior in the presence of potential errors. In this context, noise models have been proposed to study noise-related phenomena in a controlled environment, allowing errors to be introduced into the data in a supervised manner. This paper introduces the noisemodel R package, which contains the first extensive implementation of noise models for classification datasets, and proposes it as a support tool for analyzing the impact of errors in chemical data. It provides 72 noise models from the specialized literature that introduce errors in different ways into classes and attributes. Each model is properly documented and referenced, and their results are unified through a specific S3 class with customized print, summary and plot methods. The usage of the package is illustrated through four application examples on real-world chemical datasets, where errors are prone to occur. The software presented will help to deepen the understanding of the problem of noisy chemical data, as well as to develop new robust algorithms and noise preprocessing methods adapted to different types of errors in this scenario.
    Funding: University of Granada/CBU
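
    To make the idea of a noise model concrete, here is a minimal Python sketch of one simple label-noise scheme: symmetric (uniform) class noise. In this scheme a fixed fraction of instances has its label replaced by a different class chosen uniformly at random. It does not reproduce the noisemodel R package's API or any of its 72 implementations. The function name `symmetric_uniform_label_noise` and its parameters are hypothetical, and labels are assumed to be sortable so that a fixed seed gives reproducible output.

```python
import random
from typing import Hashable, List, Optional, Sequence, Tuple


def symmetric_uniform_label_noise(
    labels: Sequence[Hashable], level: float, seed: Optional[int] = None
) -> Tuple[List[Hashable], List[int]]:
    """Hypothetical symmetric label-noise model.

    Picks round(level * n) instances at random and replaces each label with a
    different class drawn uniformly from the remaining classes. Returns the
    noisy labels and the sorted indices of the corrupted instances.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    rng = random.Random(seed)
    # Sorting the class set (labels assumed sortable) keeps results reproducible for a given seed.
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("at least two classes are needed to flip labels")
    noisy = list(labels)
    n_noisy = round(level * len(noisy))
    corrupted = rng.sample(range(len(noisy)), n_noisy)
    for i in corrupted:
        # Uniform choice among all classes other than the current one.
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy, sorted(corrupted)
```

    Attribute-noise models follow the same pattern but corrupt feature values instead of class labels.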

    Impact of Regressand Stratification in Dataset Shift Caused by Cross-Validation

    Data that have not been modeled cannot be correctly predicted. Under this assumption, this research studies how k-fold cross-validation can introduce dataset shift in regression problems. Dataset shift means that the data distributions in the training and test sets differ, which degrades the estimation of model performance. Although stratification of the output variable is widely used in classification to reduce the impact of dataset shift induced by cross-validation, its use in regression is not widespread in the literature. This paper analyzes the consequences for dataset shift of including different regressand stratification schemes in cross-validation with regression data. The results show that these schemes create more similar training and test sets, reducing the dataset shift related to cross-validation. The bias and deviation of the performance estimates obtained by regression algorithms improve with the highest numbers of strata, as does the number of cross-validation repetitions needed to obtain these better results.
    Funding: MCIU/AEI/ERDF, UE PGC2018098860-B-I00; ERDF Operational Programme 2014-2020; Economy and Knowledge Council of the Regional Government of Andalusia, Spain; MCIN/AEI CEX2020-001105-M; A-FQM-345-UGR1
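
    Below is a minimal Python sketch of one possible regressand stratification scheme. It assumes simple equal-frequency binning of the sorted target values. The sorted targets are cut into `n_strata` contiguous strata, and each stratum is then shuffled and dealt round-robin across the k folds, so that every fold sees a similar range of output values. The function name and this particular scheme are illustrative assumptions, not the schemes evaluated in the paper.

```python
import random
from typing import List, Optional, Sequence


def stratified_regression_folds(
    y: Sequence[float], k: int, n_strata: int, seed: Optional[int] = None
) -> List[List[int]]:
    """Assign sample indices to k folds, stratifying on the (continuous) regressand."""
    n = len(y)
    if k < 2 or k > n:
        raise ValueError("k must be between 2 and len(y)")
    if not 1 <= n_strata <= n:
        raise ValueError("n_strata must be between 1 and len(y)")
    rng = random.Random(seed)
    # Sort indices by target value, then cut them into n_strata contiguous blocks
    # of near-equal size (equal-frequency strata).
    order = sorted(range(n), key=lambda i: y[i])
    folds: List[List[int]] = [[] for _ in range(k)]
    next_fold = 0  # carried across strata so fold sizes differ by at most one
    for s in range(n_strata):
        stratum = order[s * n // n_strata:(s + 1) * n // n_strata]
        rng.shuffle(stratum)
        # Deal the stratum's members round-robin across the folds.
        for idx in stratum:
            folds[next_fold].append(idx)
            next_fold = (next_fold + 1) % k
    return [sorted(fold) for fold in folds]
```

    With `n_strata=1` this reduces to ordinary shuffled k-fold cross-validation. Larger values force each fold to mirror the overall distribution of the regressand more closely.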

    Managing Borderline and Noisy Examples in Imbalanced Classification by Combining SMOTE with Ensemble Filtering

    Imbalanced data poses a major difficulty for most classifier-learning algorithms. However, as recent works claim, class imbalance is not a problem in itself: performance degradation is also associated with other factors related to the distribution of the data, such as the presence of noisy and borderline examples in the areas surrounding class boundaries. This contribution proposes extending SMOTE with a noise filter called the Iterative-Partitioning Filter (IPF), which can overcome these problems. The properties of this proposal are discussed in a controlled experimental study against SMOTE and its best-known generalizations. The results show that the new proposal performs better than existing SMOTE generalizations in all of the scenarios considered.
    Funding: Regional Projects P1O-TIC-06858, P11-TIC-9704, P12-TIC-2958; NCN-2013/11/B/5T6/00963; National Project TIN2011-28488; Spanish Government
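
    The SMOTE half of the proposal can be sketched as follows. Each synthetic example is a random point on the segment between a minority example and one of its k nearest minority neighbours. This is a simplified Python sketch: seed examples are drawn at random rather than cycled through each minority example as in the original SMOTE description, and the IPF noise filter is not shown. The function name and defaults are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence


def smote(
    minority: Sequence[Sequence[float]],
    n_synthetic: int,
    k: int = 5,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Generate synthetic minority examples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # k nearest minority neighbours of every minority example (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, z), j) for j, z in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # random seed example
        j = rng.choice(neighbours[i])     # one of its k neighbours
        gap = rng.random()                # position along the segment, in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic
```

    Because new points lie between existing minority examples, SMOTE can also interpolate noisy or borderline examples into the overlap region. Filtering the augmented set afterwards is intended to address exactly this problem.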

    Modulation of coaxial modal interferometers based on long period gratings in double cladding fibers

    This paper reports on the dynamic modulation of coaxial interferometers based on two cascaded long-period gratings written in double-cladding fibers. The interferometer is modulated by a piezoelectric ceramic that stretches one of the gratings at tens of kHz, and the output light is intensity-modulated with an efficiency of 97%. The device operates at 1530 nm and has a bandwidth of more than 50 nm, an insertion loss of 0.4 dB and a temperature drift of 0.11 nm/°C.

    Decision-Tree-Based Approach for Pressure Ulcer Risk Assessment in Immobilized Patients

    Applications of data mining tools in medicine and nursing are becoming increasingly frequent. Among them, decision trees have been applied to different health data, such as those associated with pressure ulcers. Pressure ulcers are a health problem with a significant impact on the morbidity and mortality of immobilized patients and on the quality of life of affected people and their families. Nurses provide comprehensive care to immobilized patients, which results in an increased workload that can be a risk factor for the development of serious health problems. Evidence-based healthcare practice that gives nursing professionals an objective criterion is an essential complement when applying preventive measures. In this work, two ways of conducting a pressure ulcer risk assessment based on a decision tree approach are provided. The first is based on the activity and mobility characteristics of the Braden scale, while the second is based on the activity, mobility and skin moisture characteristics. The results of this study give nursing professionals a foundation for combining their experience with objective criteria for quick decision making regarding a patient's risk of developing a pressure ulcer.
    Funding: Consejeria de Salud, Junta de Andalucia (Fundacion Publica Andaluza Progreso y Salud) AP-0086-201

    DispoCen. Much more than a program about lexical availability

    DispoCen is a system for the analysis of lexical availability and lexical centrality. Although there are specific programs for calculating these indices, they tend to restrict excessively the possibilities for analyzing and exploiting the data, either because they are obsolete tools or because their code is excessively closed and inaccessible. DispoCen is based on a library of R tools that puts the development of multiple applications and original models within reach of those who study the lexicon. In this paper we have included the code needed to run the analyses, thereby enhancing the replicability that supports the autonomous work of the research community. To facilitate access to the system, we also present a simple graphical utility that provides access to the most common analyses. As a sample of DispoCen's possibilities, we include a specific section with proposed analyses using sociological filters.
    Funding: This work was made possible by funding and sponsorship from the Ministerio de Ciencia, Innovación y Universidades for the research project Agenda 2050. El español de Málaga: procesos de variación y cambio espaciales y sociales (PID2019-104982GB-C5-2).
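
    The abstract does not state which availability formula DispoCen implements. A common choice in lexical availability studies is the index of López Chávez and Strassburger Carrera, which weights each mention of a word by its position in the informant's list:

    D(w) = Σ_i exp(-2.3 (i - 1)/(n - 1)) · f_i(w) / I

    Here n is the highest position reached, f_i(w) is the number of informants who produced w at position i, and I is the number of informants. The Python sketch below assumes that formula. The function name is hypothetical, and lexical centrality is not covered.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def availability_index(lists: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Lexical availability per word, assuming the López Chávez-Strassburger index.

    `lists` holds one word list per informant, in order of production.
    """
    n_informants = len(lists)
    if n_informants == 0:
        raise ValueError("at least one informant list is required")
    n = max(len(words) for words in lists)  # highest position reached
    index: Dict[str, float] = defaultdict(float)
    for words in lists:
        for pos, word in enumerate(words, start=1):
            # Weight decays from 1 at position 1 to exp(-2.3), about 0.1, at position n.
            weight = 1.0 if n == 1 else math.exp(-2.3 * (pos - 1) / (n - 1))
            index[word] += weight / n_informants
    return dict(index)
```

    Summing the weight of every occurrence and dividing by the number of informants is equivalent to the frequency-weighted sum in the formula. Words produced early by many informants therefore approach the maximum value of 1.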

    A Peripheral Socio-Economic Formation Model on the Atlantic Strip of Cádiz

    We present an overview of the Bronze Age in San Fernando. We analyze the archaeological bases for the study of the societies of the mid-second millennium BC. These were peripheral communities living in an island environment, where they exploited natural resources, mainly molluscs, but also developed significant agriculture. There was a clear relationship of dependence on a core center located in the inland countryside.

    Fiber-Optic Aqueous Dipping Sensor Based on Coaxial-Michelson Modal Interferometers

    Fiber-optic modal interferometers with a coaxial-Michelson configuration can be used to monitor aqueous solutions by simply dipping a few centimeters of the fiber tip. Fabricating these sensors to work around 850 nm enables the use of compact, robust and low-cost optical spectrum analyzers. The use of this type of portable sensor system to monitor sewage treatment plants is demonstrated.

    Explanatory Models of Burnout Diagnosis Based on Personality Factors in Primary Care Nurses

    Burnout in the primary care service occurs when there is a high level of interaction between nurses and patients. Explanatory models based on psychological and personality-related variables provide an approximation to level changes in the three dimensions of burnout syndrome. For each dimension, a categorical-response ordinal logistic regression model is fitted, based on a quantitative, cross-sectional, multicentre, descriptive study of 242 primary care nurses in the Andalusian Health Service in Granada (Spain). The three models included all the personality-related variables. The risk factor friendliness was significant at population level for all three dimensions, whereas openness was never significant. Neuroticism was significant in the models for emotional exhaustion and depersonalization, whereas responsibility was significant in the models for the depersonalization and personal accomplishment dimensions. Finally, extraversion was also significant in the emotional exhaustion and personal accomplishment dimensions. The analysis provides useful information that makes it easier to diagnose burnout syndrome and follow its evolution in this group.
    Funding: Junta de Andalucia P20_0062
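
    One standard form of ordinal logistic regression is the cumulative-logit (proportional odds) model. For ordered levels 1..K it sets P(Y ≤ k | x) = σ(θ_k − x·β) with increasing thresholds θ_k, and the probability of each category is the difference of consecutive cumulative probabilities. The Python sketch below assumes this parameterization; some software uses θ_k + x·β instead. Any coefficients passed to it are hypothetical, not estimates from the study.

```python
import math
from typing import List, Sequence


def cumulative_logit_probs(
    x: Sequence[float], beta: Sequence[float], thresholds: Sequence[float]
) -> List[float]:
    """Category probabilities under a proportional-odds (cumulative logit) model.

    P(Y <= k | x) = sigmoid(theta_k - x . beta) for k = 1..K-1, with increasing
    thresholds theta_k; P(Y = k) is the difference of consecutive cumulative terms.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(xi * bi for xi, bi in zip(x, beta))  # linear predictor x . beta
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]
    cumulative.append(1.0)  # P(Y <= K) = 1
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs
```

    Under this sign convention a positive coefficient lowers P(Y ≤ k) for every k and shifts probability toward the higher categories. A single β shared across all thresholds is what the proportional-odds assumption refers to.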